AMENDMENTS TO THE CLAIMS 

This listing of claims will replace all prior versions, and listings, of claims in the 
application: 
Listing of Claims: 

1. (currently amended) A fact extraction tool set for extracting information from a
document, wherein the document includes text, comprising: 

means for annotating the text with regular-expression-based attributes and with
tree-based attributes; and

means for extracting facts from the annotated text using pattern recognition rules
that use regular-expression-based functionality, tree-based functionality, and auxiliary definitions
in any combination.
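Claim 1 does not fix a data model for combining the two kinds of functionality. The sketch below is a hypothetical illustration only, not the claimed tool set. It pairs a regular-expression attribute (an assumed `MONEY` pattern applied to token text) with a tree-based query (`find_subtrees`) over a toy parse tree. The `Node` class, the pattern, and the noun-phrase rule are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Toy parse-tree node: leaves carry a token, inner nodes carry children."""
    label: str
    children: List["Node"] = field(default_factory=list)
    token: Optional[str] = None

    def leaves(self) -> List[str]:
        if self.token is not None:
            return [self.token]
        return [tok for child in self.children for tok in child.leaves()]


def find_subtrees(node: Node, label: str) -> Iterator[Node]:
    """Tree-based query: yield every subtree with the given label (pre-order)."""
    if node.label == label:
        yield node
    for child in node.children:
        yield from find_subtrees(child, label)


# Regex-based attribute (assumed for illustration): the token looks like a dollar amount.
MONEY = re.compile(r"\$\d+(?:\.\d{2})?")


def is_money(token: str) -> bool:
    return MONEY.fullmatch(token) is not None


def extract_money_phrases(tree: Node) -> List[str]:
    """Hypothetical fact rule using both: noun phrases that contain a money token."""
    facts = []
    for np in find_subtrees(tree, "NP"):
        words = np.leaves()
        if any(is_money(w) for w in words):
            facts.append(" ".join(words))
    return facts
```

For a tree of "Acme paid $5.00" with noun phrases over "Acme" and "$5.00", only the second phrase passes the regex test.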

2. (canceled) 

3. (currently amended) The fact extraction tool set of claim 1, wherein the means
for annotating the text comprises
means for breaking the text into its base tokens and annotating the base tokens and
patterns of base tokens with a number of orthographic, syntactic, semantic, pragmatic and
dictionary-based attributes.

4. (currently amended) The fact extraction tool set of claim 3, wherein the attributes
include tokenization, text normalization, part of speech tags, sentence boundaries, parse trees,
and semantic attribute tagging.

5. (currently amended) The fact extraction tool set of claim 1, wherein the means
for annotating the text comprises independent annotators.

6. (original) The fact extraction tool set of claim 5, wherein the independent 
annotators use XML as a basis for representing annotated text. 

7. (original) The fact extraction tool set of claim 6, further comprising means for 
resolving conflicting annotation boundaries in the annotated text to produce well-formed XML 
from the results of independent annotators. 
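Claim 7 does not say how conflicting boundaries are resolved. One common approach, shown here as an assumption rather than the claimed method, splits the later-starting of two crossing spans at the boundary of the span it crosses, so that the fragments nest. Spans are hypothetical `(start, end, tag)` triples over base-token indices with `start < end`. `resolve_crossing` and `to_xml` are illustrative names.

```python
import heapq
from typing import List, Tuple
from xml.sax.saxutils import escape

Span = Tuple[int, int, str]  # (start token, end token exclusive, tag); assumes start < end


def resolve_crossing(spans: List[Span]) -> List[Span]:
    """Split spans so that no two overlap without nesting.

    Spans are processed by start (longest first on ties). When a span starts
    inside the innermost open span but ends after it, it is cut at that
    span's end and the remainder is re-queued as a new fragment.
    """
    heap = [(s, -e, i, tag) for i, (s, e, tag) in enumerate(spans)]
    heapq.heapify(heap)
    next_id = len(spans)
    open_stack: List[Span] = []
    fragments: List[Span] = []
    while heap:
        s, neg_e, _, tag = heapq.heappop(heap)
        e = -neg_e
        # Close spans that end at or before this start.
        while open_stack and open_stack[-1][1] <= s:
            open_stack.pop()
        # Crossing boundary: cut this span at the innermost open span's end.
        if open_stack and open_stack[-1][1] < e:
            cut = open_stack[-1][1]
            heapq.heappush(heap, (cut, -e, next_id, tag))
            next_id += 1
            e = cut
        open_stack.append((s, e, tag))
        fragments.append((s, e, tag))
    return fragments


def to_xml(tokens: List[str], spans: List[Span]) -> str:
    """Serialize tokens with (possibly crossing) spans as well-formed XML."""
    starts = {}
    for s, e, tag in resolve_crossing(spans):
        starts.setdefault(s, []).append((e, tag))
    parts: List[str] = []
    open_tags: List[Tuple[int, str]] = []
    for i in range(len(tokens) + 1):
        # Nested fragments guarantee the innermost open tag closes first.
        while open_tags and open_tags[-1][0] == i:
            parts.append(f"</{open_tags.pop()[1]}>")
        if i == len(tokens):
            break
        if i > 0:
            parts.append(" ")
        for e, tag in starts.get(i, []):
            parts.append(f"<{tag}>")
            open_tags.append((e, tag))
        parts.append(escape(tokens[i]))
    return "".join(parts)
```

Take the tokens "John Smith Jr", with a NAME span over the first two tokens and an ORG span over the last two. The sketch emits `<NAME>John <ORG>Smith</ORG></NAME> <ORG>Jr</ORG>`. The ORG annotation becomes two fragments, so a consumer would need to rejoin them to recover the original span.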

8. (currently amended) The fact extraction tool set of claim 3, wherein the means for
breaking the text into its base tokens and annotating the base tokens and patterns of base
tokens comprises independent annotators, wherein the annotators are of three types comprising:

token attributes, which have a one-per-base-token alignment, where for the
attribute type represented, there is an attempt to assign an attribute to each base token;

constituent attributes assigned yes-no values to patterns of base tokens, where the
entire pattern of each base token is considered to be a single constituent with respect to some
annotation value; and

links, which assign common identifiers to coreferring and other related patterns of
base tokens.
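The three annotation types in claim 8 can be pictured as three parallel stores over the same base tokens. The container below is a hypothetical sketch under that assumption. `AnnotatedText`, `is_constituent`, and the rule that a listed span carries the value "yes" are illustrative choices, not the claimed annotators.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (first token, end token exclusive)


@dataclass
class AnnotatedText:
    tokens: List[str]
    # Token attributes: one value per base token (e.g. part of speech).
    token_attrs: Dict[str, List[str]] = field(default_factory=dict)
    # Constituent attributes: a span listed under a name has the value "yes".
    constituents: Dict[str, List[Span]] = field(default_factory=dict)
    # Links: a shared identifier groups coreferring or otherwise related spans.
    links: Dict[str, List[Span]] = field(default_factory=dict)

    def add_token_attr(self, name: str, values: List[str]) -> None:
        if len(values) != len(self.tokens):
            raise ValueError(f"{name}: expected one value per base token")
        self.token_attrs[name] = list(values)

    def add_constituent(self, name: str, span: Span) -> None:
        self.constituents.setdefault(name, []).append(span)

    def is_constituent(self, name: str, span: Span) -> bool:
        """Yes-no query: is this exact span a constituent of type `name`?"""
        return span in self.constituents.get(name, [])

    def add_link(self, link_id: str, span: Span) -> None:
        self.links.setdefault(link_id, []).append(span)
```

In "Mary said she left", for example, a PERSON constituent over token 0 and a link `e1` joining tokens 0 and 2 capture the coreference between "Mary" and "she".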



9. (original) The fact extraction tool set of claim 3, wherein the means for 
annotating a text further comprises means for associating all annotations assigned to a particular 
piece of text with the base tokens for that text to generate aligned annotations. 
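Claim 9 associates every annotation with the base tokens it covers. The sketch below assumes annotators report character offsets and shows one way to re-express those offsets as base-token spans. The tokenizer regex and the names `base_tokens` and `align` are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical base tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def base_tokens(text: str) -> List[Tuple[int, int]]:
    """Character offsets (start, end) of each base token."""
    return [(m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def align(text: str, annotations: List[Tuple[str, int, int]]) -> List[Tuple[str, int, int]]:
    """Re-express character-offset annotations as base-token spans.

    Each (label, char_start, char_end) becomes (label, first_token, end_token),
    covering every base token that overlaps the annotated characters.
    Annotations that touch no token are dropped.
    """
    offsets = base_tokens(text)
    aligned = []
    for label, cs, ce in annotations:
        hits = [i for i, (s, e) in enumerate(offsets) if s < ce and e > cs]
        if hits:
            aligned.append((label, hits[0], hits[-1] + 1))
    return aligned
```

Once two annotators' outputs are aligned this way, they share one index space and can be compared or combined token by token.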

10. (original) The fact extraction tool set of claim 9, wherein the means for extracting 
facts comprises means for identifying and extracting potentially interesting pieces of information 
in the aligned annotations by finding patterns in the attributes stored by the annotators. 

11. (original) The fact extraction tool set of claim 10, wherein the means for 
identifying and extracting potentially interesting pieces of information comprises means for 
recognizing both true left and right constituent attributes and non-contiguous constituent 
attributes. 

12. (original) The fact extraction tool set of claim 10, wherein the means for 
identifying and extracting potentially interesting pieces of information comprises at least one text 
pattern recognition rule written in a rule-based information extraction language, wherein the at 
least one text pattern recognition rule queries for at least one of literal text, attributes, and 
relationships found in the aligned annotations to define the facts to be extracted. 

13. (canceled) 



14. (original) The fact extraction tool set of claim 12, wherein the at least one text 
pattern recognition rule comprises a pattern that describes the text of interest, a label that names 
the pattern for testing and debugging purposes, and an action that indicates what should be done
in response to a successful match. 
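Claim 14's rule has three parts: a pattern, a label, and an action. The sketch below is a hypothetical rendering of that structure, not the claimed rule language. It assumes aligned token attributes are flattened into `word/TAG` text that a regular expression can scan. `Rule`, `serialize`, `apply_rules`, and `hire_rule` are illustrative names.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Rule:
    label: str                                    # names the rule for testing and debugging
    pattern: re.Pattern                           # describes the text of interest
    action: Callable[[re.Match], Dict[str, str]]  # what to do on a successful match


def serialize(tokens: List[str], tags: List[str]) -> str:
    """Flatten aligned token attributes into 'word/TAG' text a regex can scan."""
    return " ".join(f"{w}/{t}" for w, t in zip(tokens, tags))


def apply_rules(rules: List[Rule], tokens: List[str], tags: List[str],
                trace: Optional[List[str]] = None) -> List[Dict[str, str]]:
    text = serialize(tokens, tags)
    facts = []
    for rule in rules:
        for m in rule.pattern.finditer(text):
            if trace is not None:
                trace.append(rule.label)  # record which rule fired, for debugging
            facts.append(rule.action(m))
    return facts


hire_rule = Rule(
    label="hire-event",
    pattern=re.compile(r"(\w+)/NNP hired/VBD (\w+)/NNP"),
    action=lambda m: {"employer": m.group(1), "employee": m.group(2)},
)
```

Passing a `trace` list shows the label's debugging role: after a run, it records which named rules fired.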

15. (original) The fact extraction tool set of claim 12, wherein the means for 
identifying and extracting potentially interesting pieces of information further comprises at least 
one auxiliary definition statement used to name and define a fragment of a pattern. 
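An auxiliary definition, as in claim 15, names a pattern fragment so that rules can reuse it. The sketch below assumes a hypothetical `${NAME}` reference syntax and expands references by textual substitution into regular expressions. It illustrates the idea only; the claimed language's actual syntax is not given here. `DEFS`, `expand`, and `JOIN_RULE` are illustrative names.

```python
import re
from typing import Dict

# Hypothetical syntax: ${NAME} refers to an auxiliary definition.
REF_RE = re.compile(r"\$\{([A-Z_]+)\}")

DEFS: Dict[str, str] = {
    "NAME": r"\w+/NNP",                 # one proper-noun token
    "PERSON": r"${NAME}(?: ${NAME})*",  # one or more adjacent proper nouns
}


def expand(pattern: str, defs: Dict[str, str], depth: int = 0) -> str:
    """Inline auxiliary definitions, wrapping each in a non-capturing group."""
    if depth > 20:
        raise ValueError("auxiliary definitions nest too deeply (cycle?)")

    def inline(m: re.Match) -> str:
        return "(?:" + expand(defs[m.group(1)], defs, depth + 1) + ")"

    return REF_RE.sub(inline, pattern)


JOIN_RULE = re.compile(expand(r"(${PERSON}) joined/VBD (${PERSON})", DEFS))
```

Each definition is wrapped in a non-capturing group, so a quantifier applied to a reference binds to the whole fragment. Capture groups remain under the rule author's control.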

16. (currently amended) A rule-based information extraction language for use in 
identifying and extracting potentially interesting pieces of information in aligned annotations in a 
text, comprising at least one text pattern recognition rule that queries for at least one of literal 
text, attributes, and relationships found in the aligned annotations to define the facts to be 
extracted, using regular expression functionality, tree-based functionality, and auxiliary
definitions in any combination.

17. (canceled) 

18. (original) The language of claim 16, wherein the at least one text pattern 
recognition rule comprises a pattern that describes the text of interest, a label that names the 
pattern for testing and debugging purposes, and an action that indicates what should be done in 
response to a successful match. 

19. (original) The language of claim 16, further comprising at least one auxiliary
definition statement used to name and define a fragment of a pattern. 

20. (currently amended) A text annotation tool comprising: 

means for assigning syntactic and semantic attributes to a text passage by at least
one of parsing the text passage and applying text annotation processes other than parsing the text
passage, including means for breaking the text passage into its base tokens; and

means for annotating the base tokens and patterns of base tokens with a number of
orthographic, syntactic, semantic, pragmatic and dictionary-based attributes, wherein the
attributes include regular-expression-based attributes and tree-based attributes; and

means for associating all annotations assigned to a particular piece of text with the
base tokens for that particular piece of text to generate aligned annotations.

21. (currently amended) The text annotation tool of claim 20, wherein the attributes
include tokenization, text normalization, part of speech tags, sentence boundaries, parse trees,
and semantic attribute tagging.

22. (currently amended) The text annotation tool of claim 20, wherein the means for
annotating comprises independent annotators.

23. (original) The text annotation tool of claim 22, wherein the independent 
annotators use XML as a basis for representing annotated text. 

24. (original) The text annotation tool of claim 23, further comprising means for
resolving conflicting annotation boundaries in the annotated text to produce well-formed XML 
from the results of independent annotators. 

25. (currently amended) The text annotation tool of claim 20, wherein the means for
breaking the text passage into its base tokens and annotating the base tokens and patterns of base
tokens comprises independent annotators, wherein the annotators are of three types comprising:

token attributes, which have a one-per-base-token alignment, where for the
attribute type represented, there is an attempt to assign an attribute to each base token;

constituent attributes assigned yes-no values to patterns of base tokens, where the
entire pattern of each base token is considered to be a single constituent with respect to some
annotation value; and

links, which assign common identifiers to coreferring and other related patterns of
base tokens.

26. (currently amended) A computer program product for extracting information from
a document, wherein the document includes text, the computer program product comprising a 
computer usable storage medium having computer readable program code means embodied in 
the medium, the computer readable program code means comprising: 

computer readable program code means for annotating the text with
regular-expression-based attributes and with tree-based attributes; and

computer readable program code means for extracting facts from the annotated
text using pattern recognition rules that use regular-expression-based functionality, tree-based
functionality, and auxiliary definitions in any combination.

27. (canceled) 

28. (currently amended) The computer program product of claim 26, wherein the
computer readable program code means for annotating the text comprises computer readable
program code means for breaking the text into its base tokens and annotating the base tokens
and patterns of base tokens with a number of orthographic, syntactic, semantic, pragmatic and
dictionary-based attributes.

29. (currently amended) The computer program product of claim 28, wherein the
attributes include tokenization, text normalization, part of speech tags, sentence boundaries,
parse trees, and semantic attribute tagging.



30. (currently amended) The computer program product of claim 26, wherein the
computer readable program code means for annotating the text comprises independent annotators.

31. (original) The computer program product of claim 30, wherein the independent 
annotators use XML as a basis for representing annotated text. 

32. (original) The computer program product of claim 31, further comprising 
computer readable program code means for resolving conflicting annotation boundaries in the 
annotated text to produce well-formed XML from the results of independent annotators. 

33. (currently amended) The computer program product of claim 28, wherein the
computer readable program code means for breaking the text into its base tokens and
annotating the base tokens and patterns of base tokens comprises individual annotators, wherein
the annotators are of three types comprising:

token attributes, which have a one-per-base-token alignment, where for the
attribute type represented, there is an attempt to assign an attribute to each base token;

constituent attributes assigned yes-no values to patterns of base tokens, where the
entire pattern of each base token is considered to be a single constituent with respect to some
annotation value; and

links, which assign common identifiers to coreferring and other related patterns of
base tokens.



34. (currently amended) The computer program product of claim 28, wherein the 
computer readable program code means for annotating a text further comprises computer 
readable program code means for associating all annotations assigned to a particular piece of text 
with the base tokens for that particular piece of text to generate aligned annotations. 

35. (original) The computer program product of claim 34, wherein the computer 
readable program code means for extracting facts comprises computer readable program code 
means for identifying and extracting potentially interesting pieces of information in the aligned 
annotations by finding patterns in the attributes stored by the annotators. 

36. (original) The computer program product of claim 35, wherein the computer 
readable program code means for identifying and extracting potentially interesting pieces of 
information further comprises computer readable program code means for recognizing both true 
left and right constituent attributes and non-contiguous constituent attributes. 

37. (original) The computer program product of claim 35, wherein the computer 
readable program code means for identifying and extracting potentially interesting pieces of 
information comprises at least one text pattern recognition rule written in a rule-based 
information extraction language, wherein the at least one text pattern recognition rule queries for 
at least one of literal text, attributes, and relationships found in the aligned annotations to define 
the facts to be extracted. 

38. (canceled) 

39. (original) The computer program product of claim 37, wherein the at least one
text pattern recognition rule comprises a pattern that describes the text of interest, a label that 
names the pattern for testing and debugging purposes, and an action that indicates what should 
be done in response to a successful match. 

40. (original) The computer program product of claim 37, wherein the computer 
readable program code means for identifying and extracting potentially interesting pieces of 
information further comprises at least one auxiliary definition statement used to name and define 
a fragment of a pattern. 

41. (currently amended) A method of extracting information from a document, 
wherein the document includes text, comprising the steps of: 

annotating the text with regular-expression-based attributes and with tree-based
attributes; and

extracting facts from the annotated text using pattern recognition rules that use
regular-expression-based functionality, tree-based functionality, and auxiliary definitions in any
combination.

42. (canceled) 

43. (currently amended) The method of claim 41, wherein the parsing of the
text comprises breaking it into its base tokens and annotating the base tokens and
patterns of base tokens with a number of orthographic, syntactic, semantic, pragmatic and 
dictionary-based attributes. 

44. (original) The method of claim 43, wherein the attributes include tokenization, 
text normalization, part of speech tags, sentence boundaries, parse trees, semantic attribute 
tagging and other interesting attributes of the text. 

45. (currently amended) The method of claim 41, wherein the parsing of the
text is carried out by independent annotators.

46. (original) The method of claim 45, wherein the individual annotators use XML as 
a basis for representing annotated text. 

47. (original) The method of claim 46, further comprising the step of resolving 
conflicting annotation boundaries in the annotated text to produce well-formed XML from the 
results of independent annotators. 

48. (currently amended) The method of claim 43, wherein the step of breaking the
text into its base tokens and annotating the base tokens and patterns of base tokens is
carried out using independent annotators, wherein the annotators are of three types comprising:

token attributes, which have a one-per-base-token alignment, where for the
attribute type represented, there is an attempt to assign an attribute to each base token;

constituent attributes assigned yes-no values to patterns of base tokens, where the
entire pattern of each base token is considered to be a single constituent with respect to some
annotation value; and

links, which assign common identifiers to coreferring and other related patterns of
base tokens.

49. (original) The method of claim 43, wherein the step of annotating a text further 
comprises the step of associating all annotations assigned to a particular piece of text with the 
base tokens for that text to generate aligned annotations. 

50. (original) The method of claim 49, wherein the step of extracting facts comprises 
identifying and extracting potentially interesting pieces of information in the aligned annotations 
by finding patterns in the attributes stored by the annotators. 

51. (original) The method of claim 50, wherein the step of identifying and extracting 
potentially interesting pieces of information comprises recognizing both true left and right 
constituent attributes and non-contiguous constituent attributes. 

52. (original) The method of claim 50, wherein the patterns are found using at least
one text pattern recognition rule written in a rule-based information extraction language, wherein 
the at least one text pattern recognition rule queries for at least one of literal text, attributes, and 
relationships found in the aligned annotations to define the facts to be extracted. 

53. (canceled) 

54. (original) The method of claim 52, wherein the at least one text pattern 
recognition rule describes the text of interest, names the pattern for testing and debugging 
purposes, and indicates what should be done in response to a successful match.

55. (original) The method of claim 52, wherein the patterns are found further using at 
least one auxiliary definition statement used to name and define a fragment of a pattern. 